Skip to content

reprocess failed account events #6571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

escattone
Copy link
Contributor

@escattone escattone commented Mar 18, 2025

mozilla/sumo#2217
mozilla/sumo#2271

Notes

This work introduces a periodic cron task (runs every 4 hours) that gathers all unprocessed account events within the last 24 hours and queues each of them for re-processing. So apart from the Celery retry mechanism on exceptions -- which I reduced from 4 retries to 3 retries, and occurs over the course of about 15 seconds -- the processing of failed (unprocessed) account event tasks will be attempted 5-6 times over the course of the 24-hour period starting from the moment of their creation. Account events that remain in the unprocessed state after 24 hours will no longer be re-processed.

I've also added DMS Snitches for stage and prod, as well as defined their URL's for settings.DMS_REPROCESS_FAILED_ACCOUNT_EVENTS in our GCP secrets.

As part of this work, I made the following changes to each of the tasks within kitsune.users.tasks:

  • Removed the @skip_if_read_only_mode decorators. They were useless even before we removed the Celery workers from the failover clusters -- because they would still "steal" events -- but even more so now that the Celery workers in the failover cluster have been removed.
  • Added the @transaction.atomic decorators, so that all of the DB changes for each task are handled as an atomic chunk. If any one of them fail, all of the others are rolled back. This is to prevent failures from leaving behind a residue of partially-completed work, so that retries always start with a clean slate.
  • When fetching the account event at the beginning of each task, skip any further work if the event no longer exists or it has already been processed. This was done to ensure robust handling in the case of multiple Celery tasks queued for the same event, which could happen given the new periodic cron task that re-processes failed account events.

@escattone escattone requested a review from akatsoulas March 19, 2025 02:31
@escattone escattone force-pushed the reprocess-failed-account-event-tasks branch from ff10628 to 0d4eacf Compare March 24, 2025 21:29
@akatsoulas akatsoulas requested a review from Copilot March 31, 2025 10:05
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a periodic cron task to reprocess failed account events within the past 24 hours. It reduces the Celery retry count, adds transaction atomicity to task processing, and updates tests and the management command accordingly.

  • Adds a new cron job in scripts/cron.py to trigger the reprocessing command every 4 hours.
  • Removes the skip_if_read_only_mode decorators and wraps task functions in transaction.atomic.
  • Updates tests to verify the atomicity of delete, subscription state change, and password change events.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
scripts/cron.py Adds a new scheduled job for reprocessing failed account events.
kitsune/users/tests/test_tasks.py Introduces tests to ensure atomicity for various account event tasks.
kitsune/users/tasks.py Removes read-only mode checks, adds transaction.atomic, and adjusts retry count.
kitsune/users/management/commands/reprocess_failed_account_events.py Updates command help text and argument naming to reflect hours instead of days.
kitsune/settings.py Introduces a new settings variable for DMS reprocessing.

@escattone escattone force-pushed the reprocess-failed-account-event-tasks branch from 0d4eacf to cb917ad Compare March 31, 2025 16:55
@escattone escattone merged commit 09ebaa3 into mozilla:main Apr 1, 2025
2 checks passed
@escattone escattone deleted the reprocess-failed-account-event-tasks branch April 1, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants